One-dimensional arrays of nucleosomes (DNA-bound histone octamers separated by stretches of linker DNA) fold into higher-order chromatin structures which ultimately make up eukaryotic chromosomes. Chromatin structure formation leads to 10-11 base pair (bp) discretization of linker lengths caused by the smaller free energy cost of packaging nucleosomes into regular chromatin fibers if their rotational setting (defined by the DNA helical twist) is conserved. We describe nucleosome positions along the fiber using a thermodynamic model of finite-size particles with both intrinsic histone-DNA interactions and an effective two-body potential. We infer one- and two-body energies directly from high-throughput maps of nucleosome positions. We show that chromatin structure explains in vitro and in vivo nucleosome ordering in transcribed regions, and plays a leading role in establishing the well-known 10-11 bp genome-wide periodicity of nucleosome positions.



PACS numbers: 87.18.Wd, 87.80.St, 05.20.Jj 

In living cells, eukaryotic DNA is found in a compact, multi-scale chromatin state [1]. The fundamental unit of chromatin is a nucleosome: 147 bp of DNA wrapped around a histone octamer [2]. In addition to its primary function of DNA compaction, chromatin modulates DNA accessibility to transcription factors and other molecular machines in response to external signals, exerting a profound influence on numerous DNA-mediated biological processes such as gene transcription, DNA repair, and replication [3].

Equilibrium thermodynamic models that account for intrinsic histone-DNA sequence preferences and nearest-neighbor steric exclusion have been used to predict nucleosome positions and formation energies [3-6]. However, structural regularity of the chromatin fiber imposes additional constraints, leading to discretization of linker lengths between neighboring nucleosomes with the 10-11 bp periodicity of the DNA double helix [3, 8]. The discretization is required to avoid steric clashes caused by the nucleosome rotating around the linker DNA axis as the linker is extended [9], and more generally to minimize the free energy costs associated with maintaining a regular pattern of protein-protein and protein-DNA contacts in the chromatin fiber [8]. Indeed, adding a short DNA segment to the linker will result in a rotation of the nucleosome with respect to the rest of the fiber, disrupting its periodic structure. This additional twist has to be compensated unless the segment is 10-11 bp in length, bringing the nucleosome into an equivalent rotational position.
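To make the twist argument concrete, the short calculation below (an illustrative sketch, not part of the original analysis) assumes a helical repeat of 10.5 bp per turn, a representative value within the 10-11 bp range quoted above, and computes how far an extra linker segment rotates a nucleosome away from its original rotational setting. The function name `residual_rotation_deg` is hypothetical.

```python
# Assumed DNA helical repeat (bp per turn); the text only states 10-11 bp.
HELICAL_REPEAT_BP = 10.5


def residual_rotation_deg(extra_bp: float, repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Angle (degrees, in [0, 180]) between the old and new rotational settings of a
    nucleosome after its linker is lengthened by `extra_bp` base pairs."""
    angle = (360.0 * extra_bp / repeat_bp) % 360.0
    # Rotating by theta or by 360 - theta is equally far from the starting setting.
    return min(angle, 360.0 - angle)


for extra in (1, 5, 10, 11, 21):
    print(f"+{extra} bp -> {residual_rotation_deg(extra):.1f} degrees")
```

Under this assumption, a 5 bp insertion turns the nucleosome by roughly half a turn (about 171 degrees), whereas 10 or 11 bp insertions leave it within about 17 degrees of an equivalent setting, and 21 bp (two full turns) restores it exactly.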

Large-scale maps of in vivo and in vitro nucleosome positions in yeast reveal nucleosome-depleted regions (NDRs) in the vicinity of transcription start and termination sites (TSS and TTS) [10, 11]. In these experiments, chromatin is digested with micrococcal nuclease to obtain mononucleosome core particles, and the mononucleosomal DNA is purified and either sequenced or hybridized to microarrays [12]. 5' NDRs play a key role in gene regulation [10]. NDRs are also observed in vitro, where they are defined by poly(dA:dT) tracts and other nucleosome-disfavoring sequences. Surprisingly, there are no oscillations in nucleosome occupancy around in vitro NDRs and, on average, just a ~25% depletion of the occupancy over 5' NDRs compared with the genome-wide mean [5, 11] (the occupancy of a bp is defined as its probability to be nucleosome-covered). This is true even if genomic DNA from S. cerevisiae is combined with purified histones in a 1:1 mass ratio, yielding a maximum nucleosome occupancy of 0.82, which is close to the in vivo value [11]. This behavior is in sharp contrast with in vivo chromatin, in which the action of transcription factors, chromatin remodeling enzymes and components of the transcriptional machinery results in well-positioned genic nucleosomes and highly pronounced 5' NDRs (~70% depletion on average with respect to the mean) [5, 10]. Because occupancy oscillations are a generic feature of one-dimensional liquids of finite-size particles in the vicinity of potential barriers and wells [13], the absence of such oscillations in vitro, together with the shallow NDRs, strongly suggests that sequence-specific histone-DNA interaction energies are on average comparable to $k_BT$. Consequently, nucleosome-positioning and nucleosome-disfavoring sequences are expected to play a minor role in establishing the in vivo localization of genic nucleosomes.

Here we focus on how nucleosome positions are affected by effective two-body interactions imposed on neighboring particles by regular chromatin structure. We map a three-dimensional chromatin fiber onto a system of non-overlapping particles of length $a = 147$ bp with both histone-DNA and short-range nearest-neighbor interactions. The particles are confined to a one-dimensional lattice of length $L$. We develop a theory in which the interaction (that reflects linker discretization) is deduced exactly from the two-particle distribution, even in the presence of 10-11 bp periodic one-body energies related to the rotational positioning of the nucleosome [11].

FIG. 1: (Color online) A model with 10 bp oscillations in both one-body and two-body energies. The two-body interaction is $\Phi(x) = A\cos(2\pi x/10)\,e^{-x/b}$, where $A = 5\,k_BT$ and $b = 50$ bp. For the one-body potential, 10 bp oscillations with a $0.5\,k_BT$ amplitude were superimposed onto a smooth energy profile with two $-5\,k_BT$ potential wells separated by 1000 bp. A DNA length of 2416 bp was chosen so that 16 nucleosomes fit with a 151 bp repeat length. The occupancy profile (a), the linker length distribution (b), the one-body energy (c), and the two-body interaction (d): exact (solid blue line) and predicted (dashed green line), with $\mu - \langle u \rangle = -1\,k_BT$ in (a)-(d). Inset of (a): the probability of starting a nucleosome at a given bp. (e) Average number of nucleosomes $\langle N_{tot} \rangle$ vs. $\mu - \langle u \rangle$. Insets: occupancy profiles corresponding to three different chemical potentials. (f) Linker length distributions for three values of $\langle N_{tot} \rangle$ shown as points in (e), with and without two-body interactions.

Let $u(k)$ be the external potential energy of a particle that occupies positions $k$ through $k + a - 1$ on the DNA, and let $\Phi(k,l)$ be the two-body interaction between a pair of nearest-neighbor particles with starting positions $k$ and $l$, respectively. Here $u(k)$ describes intrinsic histone-DNA interactions, whereas $\Phi(k,l)$ accounts for the effects of chromatin structure. The grand-canonical partition function is given by

$$Z = 1 + \sum_{N=1}^{N_{\max}} \langle J|(zw)^{N-1}z|J\rangle = 1 + \langle J|(I - zw)^{-1}z|J\rangle, \tag{1}$$

where $N_{\max}$ is the maximum number of particles that can be positioned on $L$ bp, $I$ is the identity matrix, $|j\rangle$ is a unit vector of dimension $L - a + 1$ with 1 at position $j$, and $|J\rangle = \sum_{j=1}^{L-a+1}|j\rangle$. In matrix notation, $\langle k|z|l\rangle = e^{\beta[\mu - u(k)]}\delta_{k,l}$ and $\langle k|w|l\rangle = e^{-\beta\Phi(k,l)}\theta(l - k - a)$, where $\mu$ is the chemical potential, $\delta_{k,l}$ is the Kronecker delta, $\beta$ is the inverse temperature, and $\theta$ is the Heaviside step function [with $\theta(0) = 1$, so that neighboring particles may touch but not overlap].
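A minimal sketch of Eq. (1), assuming zero-based positions and illustrative toy energies (all function names and parameter values below are hypothetical, not the authors' code). Because $\langle k|w|l\rangle$ vanishes unless $l \ge k + a$, the matrix $I - zw$ is unit upper-triangular, so the row vector $\langle J|(I - zw)^{-1}$ follows from forward substitution instead of an explicit inverse; a brute-force sum over configurations on a tiny lattice serves as a check.

```python
import itertools
import math


def boundary_sums(z, w, a):
    """Return A[i] = <J|(I - z w)^{-1}|i> and B[i] = <i|(I - w z)^{-1}|J>.

    z[k] is the diagonal element exp(beta*(mu - u(k))); w(k, l) returns <k|w|l>.
    Because <k|w|l> = 0 unless l >= k + a, A solves A_i = 1 + sum_k A_k z_k w(k, i)
    by forward substitution and B solves B_i = 1 + sum_l w(i, l) z_l B_l backwards.
    """
    M = len(z)
    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] * w(k, i) for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(w(i, l) * z[l] * B[l] for l in range(i + a, M))
    return A, B


def partition_function(z, w, a):
    """Eq. (1): Z = 1 + <J|(I - z w)^{-1} z|J> = 1 + sum_i A_i z_i."""
    A, _ = boundary_sums(z, w, a)
    return 1.0 + sum(A_i * z_i for A_i, z_i in zip(A, z))


def partition_function_brute_force(z, w, a):
    """Direct sum over all sets of non-overlapping starting positions (tiny lattices only)."""
    M, Z = len(z), 1.0  # the empty lattice contributes 1
    for N in range(1, M // a + 2):
        for starts in itertools.combinations(range(M), N):
            pairs = list(zip(starts, starts[1:]))
            if any(l - k < a for k, l in pairs):
                continue  # overlapping particles are forbidden
            Z += math.prod(z[k] for k in starts) * math.prod(w(k, l) for k, l in pairs)
    return Z


# Toy example (hypothetical parameters): a = 3, L = 12, so L - a + 1 = 10 starting positions.
a, L, beta, mu = 3, 12, 1.0, 0.5
u = [0.3 * math.sin(k) for k in range(L - a + 1)]
z = [math.exp(beta * (mu - u_k)) for u_k in u]


def w(k, l):
    """<k|w|l> = exp(-beta*Phi) * theta(l - k - a), with Phi a damped 10 bp cosine of the linker."""
    if l - k < a:
        return 0.0
    linker = l - k - a
    return math.exp(-beta * 2.0 * math.cos(2 * math.pi * linker / 10) * math.exp(-linker / 5))


print(partition_function(z, w, a), partition_function_brute_force(z, w, a))
```

The recursion costs $O(M^2)$ operations for $M = L - a + 1$ positions, whereas the brute-force enumeration grows combinatorially and is only meant as a correctness check.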

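The one-particle and nearest-neighbor pair distribution functions are:

$$n(i) = \frac{1}{Z}\langle J|(I - zw)^{-1}|i\rangle\langle i|z|i\rangle\langle i|(I - wz)^{-1}|J\rangle, \tag{2}$$

$$n_2(i,j) = \frac{1}{Z}\langle J|(I - zw)^{-1}|i\rangle\langle i|zwz|j\rangle\langle j|(I - wz)^{-1}|J\rangle. \tag{3}$$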
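Eqs. (2) and (3) can be evaluated with the same forward and backward sums used for Eq. (1). The sketch below is a hedged illustration under hypothetical toy parameters: it computes $n(i)$ and the nearest-neighbor $n_2(i,j)$ and compares them with probabilities obtained by enumerating every allowed configuration, where "nearest neighbors" means consecutive particles in a configuration.

```python
import itertools
import math


def boundary_sums(z, w, a):
    """A[i] = <J|(I - z w)^{-1}|i>, B[i] = <i|(I - w z)^{-1}|J>; w(k, l) = 0 unless l >= k + a."""
    M = len(z)
    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] * w(k, i) for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(w(i, l) * z[l] * B[l] for l in range(i + a, M))
    return A, B


def distributions(z, w, a):
    """Eqs. (2)-(3): n[i] and nearest-neighbor n2[i][j], plus Z from Eq. (1)."""
    M = len(z)
    A, B = boundary_sums(z, w, a)
    Z = 1.0 + sum(A[i] * z[i] for i in range(M))
    n = [A[i] * z[i] * B[i] / Z for i in range(M)]
    # <i|z w z|j> = z_i w(i, j) z_j, nonzero only for j >= i + a.
    n2 = [[A[i] * z[i] * w(i, j) * z[j] * B[j] / Z if j - i >= a else 0.0 for j in range(M)]
          for i in range(M)]
    return n, n2, Z


def distributions_brute_force(z, w, a):
    """Same quantities by enumerating every allowed configuration (tiny lattices only)."""
    M = len(z)
    n, n2, Z = [0.0] * M, [[0.0] * M for _ in range(M)], 1.0
    for N in range(1, M // a + 2):
        for starts in itertools.combinations(range(M), N):
            pairs = list(zip(starts, starts[1:]))  # consecutive particles = nearest neighbors
            if any(l - k < a for k, l in pairs):
                continue
            weight = math.prod(z[k] for k in starts) * math.prod(w(k, l) for k, l in pairs)
            Z += weight
            for k in starts:
                n[k] += weight
            for k, l in pairs:
                n2[k][l] += weight
    return [x / Z for x in n], [[x / Z for x in row] for row in n2], Z


# Hypothetical toy parameters.
a, M, beta, mu = 3, 10, 1.0, 0.5
z = [math.exp(beta * (mu - 0.3 * math.sin(k))) for k in range(M)]


def w(k, l):
    linker = l - k - a
    if linker < 0:
        return 0.0
    return math.exp(-beta * 2.0 * math.cos(2 * math.pi * linker / 10) * math.exp(-linker / 5))


n, n2, Z = distributions(z, w, a)
n_bf, n2_bf, Z_bf = distributions_brute_force(z, w, a)
print(max(abs(x - y) for x, y in zip(n, n_bf)))
```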

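Note that for $a \le j - i < 2a$, $n_2(i,j) = \tilde{n}_2(i,j)$, where $\tilde{n}_2$ is the ordinary two-particle distribution function (no third particle fits between $i$ and $j$). Defining two matrices, $\langle i|N|j\rangle = n(i)\delta_{ij}$ and $\langle i|N_2|j\rangle = n_2(i,j)$, we rewrite the partition function as

$$Z = \frac{1}{1 - \langle J|(I - N_2N^{-1})N|J\rangle}. \tag{4}$$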
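Eq. (4) has a simple counting interpretation. Since $(I - N_2N^{-1})N = N - N_2$, the bracket equals $\sum_i n(i) - \sum_{i,j} n_2(i,j)$, the mean number of particles minus the mean number of nearest-neighbor pairs. Every non-empty configuration with $N$ particles has exactly $N - 1$ such pairs, so the difference is the probability that the lattice is not empty, $1 - 1/Z$. The sketch below checks this numerically; the toy parameters and function names are hypothetical.

```python
import math


def distributions(z, w, a):
    """n[i], nearest-neighbor n2[i][j] and Z from Eqs. (1)-(3) via forward/backward sums."""
    M = len(z)
    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] * w(k, i) for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(w(i, l) * z[l] * B[l] for l in range(i + a, M))
    Z = 1.0 + sum(A[i] * z[i] for i in range(M))
    n = [A[i] * z[i] * B[i] / Z for i in range(M)]
    n2 = [[A[i] * z[i] * w(i, j) * z[j] * B[j] / Z if j - i >= a else 0.0 for j in range(M)]
          for i in range(M)]
    return n, n2, Z


def partition_from_distributions(n, n2):
    """Eq. (4): Z = 1 / (1 - <J|(N - N2)|J>) = 1 / (1 - sum(n) + sum(n2))."""
    return 1.0 / (1.0 - sum(n) + sum(sum(row) for row in n2))


a, M, beta, mu = 4, 16, 1.0, -1.0  # hypothetical toy lattice
z = [math.exp(beta * (mu - 0.5 * math.cos(0.7 * k))) for k in range(M)]


def w(k, l):
    linker = l - k - a
    if linker < 0:
        return 0.0
    return math.exp(-beta * 1.0 * math.cos(2 * math.pi * linker / 10) * math.exp(-linker / 8))


n, n2, Z = distributions(z, w, a)
print(Z, partition_from_distributions(n, n2))
```

In practice $1/Z$ is obtained here as a difference of sums, so for very large $Z$ (dense lattices) the subtraction loses precision; the toy parameters keep $Z$ moderate.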



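By inverting Eqs. (2) and (3), we obtain the exact expressions for one- and two-body energies [14, 15]:

$$-\beta[u(k) - \mu] = \ln\left(\frac{\langle J|I - N_2N^{-1}|k\rangle\langle k|N|k\rangle\langle k|I - N^{-1}N_2|J\rangle}{1 - \langle J|(I - N_2N^{-1})N|J\rangle}\right), \tag{5}$$

$$-\beta\Phi(k,l) = \ln\left(\frac{\langle k|N^{-1}N_2N^{-1}|l\rangle\left[1 - \langle J|(I - N_2N^{-1})N|J\rangle\right]}{\langle k|I - N^{-1}N_2|J\rangle\langle J|I - N_2N^{-1}|l\rangle}\right), \tag{6}$$

where Eq. (6) applies to non-overlapping pairs, $l \ge k + a$.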
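A hedged round-trip check of Eqs. (5) and (6): choose toy energies $u$ and $\Phi$ (hypothetical values), generate $n$ and $n_2$ with the forward model, and recover the energies. In the sketch, `left[k]` stands for $\langle J|I - N_2N^{-1}|k\rangle = 1 - \sum_i n_2(i,k)/n(k)$ and `right[k]` for $\langle k|I - N^{-1}N_2|J\rangle = 1 - \sum_j n_2(k,j)/n(k)$; both are plain-list stand-ins for the matrix elements, not code from the original work.

```python
import math


def distributions(z, w, a):
    """Forward model, Eqs. (1)-(3): n[i], nearest-neighbor n2[i][j] and Z."""
    M = len(z)
    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] * w(k, i) for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(w(i, l) * z[l] * B[l] for l in range(i + a, M))
    Z = 1.0 + sum(A[i] * z[i] for i in range(M))
    n = [A[i] * z[i] * B[i] / Z for i in range(M)]
    n2 = [[A[i] * z[i] * w(i, j) * z[j] * B[j] / Z if j - i >= a else 0.0 for j in range(M)]
          for i in range(M)]
    return n, n2, Z


def invert(n, n2, a):
    """Eqs. (5)-(6): beta*(mu - u(k)) and -beta*Phi(k, l) for l >= k + a, from n and n2."""
    M = len(n)
    inv_Z = 1.0 - sum(n) + sum(sum(row) for row in n2)  # 1 - <J|(I - N2 N^-1) N|J>
    left = [1.0 - sum(n2[i][k] for i in range(M)) / n[k] for k in range(M)]
    right = [1.0 - sum(n2[k]) / n[k] for k in range(M)]
    log_z = [math.log(left[k] * n[k] * right[k] / inv_Z) for k in range(M)]
    minus_beta_phi = {}
    for k in range(M):
        for l in range(k + a, M):
            ratio = n2[k][l] / (n[k] * n[l])  # <k|N^-1 N2 N^-1|l>
            minus_beta_phi[(k, l)] = math.log(ratio * inv_Z / (right[k] * left[l]))
    return log_z, minus_beta_phi


# Hypothetical toy model: choose u and Phi, generate n and n2, then invert them.
a, M, beta, mu = 3, 10, 1.0, 0.5
u = [0.3 * math.sin(k) for k in range(M)]
z = [math.exp(beta * (mu - u_k)) for u_k in u]


def phi(linker):
    return 2.0 * math.cos(2 * math.pi * linker / 10) * math.exp(-linker / 5)


def w(k, l):
    return math.exp(-beta * phi(l - k - a)) if l - k >= a else 0.0


n, n2, _ = distributions(z, w, a)
log_z, minus_beta_phi = invert(n, n2, a)
print(log_z[:3], minus_beta_phi[(0, 3)])
```

The recovered `log_z[k]` should equal $\beta[\mu - u(k)]$ and `minus_beta_phi[(k, l)]` should equal $-\beta\Phi(k,l)$ up to round-off, which is what makes the inversion "exact" when $n_2$ is available.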



Note that if the two-body interactions are neglected, Eq. (5) reduces to [5]

$$-\beta[u(i) - \mu] = \ln\left[\frac{n(i)}{1 - O(i) + n(i)}\prod_{j=i}^{i+a-1}\frac{1 - O(j) + n(j)}{1 - O(j)}\right], \tag{7}$$

where $O(i)$ is the nucleosome occupancy of bp $i$ [$O(i) = \sum_{k=i-a+1}^{i} n(k)$].
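The sketch below illustrates Eq. (7) for pure hard-core particles ($\Phi = 0$): it generates $n(i)$ from a hypothetical one-body landscape, builds $O(j)$ for every bp, and recovers $\beta[\mu - u(i)]$. One way to read the formula is that $1 - O(j) + n(j)$ is the probability that no particle straddles the boundary just before bp $j$, while $1 - O(j)$ is the probability that bp $j$ is uncovered. Positions are zero-based, $n(j) = 0$ for the last $a - 1$ bp where no particle can start, and the function names are illustrative only.

```python
import math


def start_probabilities(z, a):
    """n(i) for hard-core particles (Phi = 0), via the forward/backward sums behind Eqs. (1)-(2)."""
    M = len(z)
    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(z[l] * B[l] for l in range(i + a, M))
    Z = 1.0 + sum(A[i] * z[i] for i in range(M))
    return [A[i] * z[i] * B[i] / Z for i in range(M)]


def occupancy(n, a, L):
    """O(j) = sum_{k=j-a+1}^{j} n(k) for bp j = 0..L-1."""
    return [sum(n[max(0, j - a + 1):j + 1]) for j in range(L)]


def one_body_from_occupancy(n, a, L):
    """Eq. (7): beta*(mu - u(i)) when two-body interactions are neglected."""
    O = occupancy(n, a, L)
    n_bp = n + [0.0] * (L - len(n))  # no particle can start in the last a - 1 bp
    result = []
    for i in range(len(n)):
        value = math.log(n[i] / (1.0 - O[i] + n[i]))
        for j in range(i, i + a):
            value += math.log((1.0 - O[j] + n_bp[j]) / (1.0 - O[j]))
        result.append(value)
    return result


# Hypothetical toy landscape: a = 5, L = 40 bp, with a potential well near the middle.
a, L, beta, mu = 5, 40, 1.0, 0.0
M = L - a + 1
u = [-2.0 * math.exp(-((k - 17) / 4.0) ** 2) + 0.2 * math.sin(k) for k in range(M)]
z = [math.exp(beta * (mu - u_k)) for u_k in u]
n = start_probabilities(z, a)
recovered = one_body_from_occupancy(n, a, L)
print(max(abs(r - beta * (mu - u_k)) for r, u_k in zip(recovered, u)))
```

For $a = 1$ the product collapses to $n/(1 - n)$, the familiar Langmuir form, which is a quick sanity check on the indexing.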

If one-body energies $u$ and two-body interactions $\Phi$ are known, Eqs. (2) and (3) allow us to construct particle distributions $n$ and $n_2$ exactly. Conversely, we can use Eqs. (5) and (6) to find $u$ and $\Phi$ from one- and two-particle distributions. However, the two-particle distribution is not directly measured in current high-throughput experiments, in which chromatin from many cells is mixed together before mononucleosomes are isolated and sequenced. In other words, it is not known which particular genome a given nucleosome comes from. This is irrelevant for $n$ but is a problem for $n_2$, which requires knowledge of two-nucleosome configurations on the same DNA molecule. Nonetheless, we can build a model for $n_2$ which allows us to approximate the two-body interaction.

Let $g(i,j) = n_2(i,j)/[n(i)n(j)]$ be the pair distribution. Without one-body energies, the system is homogeneous and $g$ is a function of only the relative distance between the nucleosomes: $g(i,j) = g(j - i)$. In this case Eq. (6) reduces to

$$-\beta\Phi(i,j) = \ln[g(j - i)] + \alpha(j - i) + \ln C \tag{8}$$

for arbitrary interactions $\Phi$ [16]. The constants $C$ and $\alpha$ can be determined from the asymptotic condition $\lim_{|j-i|\to\infty}\Phi(i,j) = 0$. However, position-dependent one-body energies break translational invariance of the pair distribution $g$. Assuming that $\Phi$ is translationally invariant, we introduce $P_{\rm linker}(\Delta) = \langle g(i, i + \Delta + a)\rangle_i$ and approximate $\Phi$ as

$$-\beta\Phi(i,j) \approx \ln[P_{\rm linker}(j - (i + a))] + \alpha(j - i) + \ln C. \tag{9}$$
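Eq. (9) leaves $\alpha$ and $C$ to be fixed by the condition that $\Phi$ vanishes at long range; the text does not say how this is done numerically. The sketch below assumes one simple choice: a least-squares straight line through $\ln P_{\rm linker}(\Delta)$ over a tail region where $\Phi$ has decayed, which is then subtracted. Since $j - i = \Delta + a$, the term $\alpha(j - i)$ is linear in $\Delta$ and the constant $\alpha a$ is absorbed into $C$. The function name, the tail cutoff, and the synthetic input (which uses the Fig. 1 functional form but a shorter decay length so that $\Phi$ vanishes inside the window) are all hypothetical.

```python
import math


def beta_phi_from_p_linker(p_linker, tail_start):
    """Eq. (9): beta*Phi(Delta) = -[ln P_linker(Delta) + alpha*Delta + const].

    alpha and the constant come from a least-squares line through ln P_linker over
    Delta >= tail_start (an assumed way of imposing Phi -> 0 at long range).
    """
    xs = list(range(tail_start, len(p_linker)))
    ys = [math.log(p_linker[d]) for d in xs]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Subtract the fitted linear tail; what remains is -beta*Phi, so flip the sign.
    return [-(math.log(p) - (slope * d + intercept)) for d, p in enumerate(p_linker)]


# Synthetic input: damped 10 bp cosine (A = 5 k_BT, b = 10 bp) times an exponential tail.
A, b = 5.0, 10.0


def beta_phi_true(d):
    return A * math.cos(2 * math.pi * d / 10) * math.exp(-d / b)


p = [math.exp(-beta_phi_true(d) - 0.03 * d) for d in range(200)]
total = sum(p)
p = [x / total for x in p]  # normalization is irrelevant: it is absorbed in C
recovered = beta_phi_from_p_linker(p, tail_start=100)
print(max(abs(r - beta_phi_true(d)) for d, r in enumerate(recovered)))
```

In a homogeneous system the linear term removes the exponential tail of the nearest-neighbor distance distribution; with real data the tail region is short and noisy, so the choice of fitting window matters.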

This step is reminiscent of replacing the ensemble average 
with the time average in statistical mechanics. Our nu- 
merical tests show this to be an excellent approximation, 
even if one- and two-body energies are comparable in 
magnitude, making the system strongly inhomogeneous. 

Experimental nucleosome positioning data sets consist of the histogram of the number of nucleosomes starting at each genomic bp $i$. We preprocess the data by removing all counts of height 1 from the histogram and smoothing the remaining profile with a $\sigma = 2$ Gaussian kernel. Next, we compute $n(i)$ by rescaling the smoothed profile so that the maximum occupancy for each chromosome is 1. Finally, we identify all local maxima on the $n$ profile and assume that they mark prevalent nucleosome positions. For each maximum at bp $i$, we find subsequent maxima at positions $i + 146 < j_1 < j_2 < j_3 < \cdots$ within a 50 bp window. To each pair of maxima $(i, j_1), (i, j_2), \ldots$ we assign the probability that they represent neighboring nucleosomes: $n(i)n(j_1)$, $n(i)[1 - n(j_1)]n(j_2)$, and so on. By summing over all initial positions $i$ and normalizing, we obtain the linker length probability, which gives us an empirical estimate of $P_{\rm linker}$.
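The preprocessing and pairing steps above can be sketched as follows for a single chromosome. This is a hedged illustration rather than the authors' pipeline: the kernel truncation at $4\sigma$, zero padding at the chromosome ends, the tie-breaking rule for local maxima, and reading the 50 bp window as linker lengths $\Delta = 0, \ldots, 49$ are assumptions, and all function names and inputs are hypothetical.

```python
import math


def preprocess(counts, a=147, sigma=2.0):
    """Start-count histogram -> n(i): drop counts of 1, Gaussian-smooth, rescale max occupancy to 1."""
    cleaned = [0.0 if c == 1 else float(c) for c in counts]
    half = int(4 * sigma)  # kernel truncated at +/- 4 sigma (assumption)
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-half, half + 1)]
    norm = sum(kernel)
    smooth = [sum(cleaned[i + d] * kernel[d + half]
                  for d in range(-half, half + 1) if 0 <= i + d < len(cleaned)) / norm
              for i in range(len(cleaned))]
    # Occupancy of bp j is O(j) = sum_{k=j-a+1}^{j} n(k); rescale so that max O = 1.
    max_occupancy = max(sum(smooth[max(0, j - a + 1):j + 1]) for j in range(len(smooth)))
    return [s / max_occupancy for s in smooth]


def local_maxima(n):
    """Positions that rise strictly from the left and do not increase to the right."""
    return [i for i in range(1, len(n) - 1) if n[i] > n[i - 1] and n[i] >= n[i + 1]]


def estimate_p_linker(n, a=147, window=50):
    """Empirical P_linker(Delta), Delta = 0..window-1, from pairs of maxima of n."""
    peaks = local_maxima(n)
    p = [0.0] * window
    for idx, i in enumerate(peaks):
        weight = n[i]  # becomes n(i)[1 - n(j1)]...[1 - n(j_{m-1})] as candidates are passed
        for j in peaks[idx + 1:]:
            linker = j - i - a
            if linker < 0:
                continue  # j would overlap the nucleosome starting at i
            if linker >= window:
                break
            p[linker] += weight * n[j]
            weight *= 1.0 - n[j]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Hypothetical inputs: a short start-count histogram and a hand-made n profile.
counts = [0] * 400
counts[50], counts[52], counts[200], counts[300] = 10, 6, 8, 1
n_profile = preprocess(counts)

n_toy = [0.0] * 600
n_toy[10], n_toy[160], n_toy[165] = 0.8, 0.5, 0.4
p_linker = estimate_p_linker(n_toy)
print(p_linker[3], p_linker[8])
```

For the toy profile, the maximum at bp 10 pairs with bp 160 (linker 3) with weight $0.8 \times 0.5 = 0.4$ and with bp 165 (linker 8) with weight $0.8 \times 0.5 \times 0.4 = 0.16$, so after normalization $P_{\rm linker}(3) = 0.4/0.56$ and $P_{\rm linker}(8) = 0.16/0.56$.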

Fig. 1 demonstrates our procedure in a model system, with preprocessing and rescaling steps skipped since the simulated $n$ profile is noise-free and already properly normalized. Specifically, we use local maxima in the nucleosome starting probability profile [inset of Fig. 1(a)] to obtain $P_{\rm linker}$ [Fig. 1(b)]. Fig. 1(d) shows that the two-body interaction can be reconstructed using Eq. (9) even in the presence of one-body energies with the same periodicity. The reconstruction is facilitated by the presence of potential wells or barriers in the one-body energy profile that are strong enough to create a non-uniform density of nearby nucleosomes. To find the one-body energies, we substitute the predicted $\Phi$ into Eq. (2), which we solve numerically for $z$ [Fig. 1(c)]. Nucleosome occupancies inferred from the predicted $u$ and $\Phi$ are virtually identical to the exact profile [Fig. 1(a)].

FIG. 2: (Color online) (a) Two-body interaction $\Phi$ inferred from in vitro maps of nucleosome positions [5, 11]. Grey bars indicate consensus positions of the minima. (b) Autocorrelation of nucleosome starting positions in one of the in vitro data sets [11], and of starting positions predicted using sequence-specific one-body energies from the "spatially resolved" model [6], with and without $\Phi$. The two-body potential is from Fig. 1, consistent with the minima of $\Phi$ observed in (a). The one-body energies have $\sigma = 0.23\,k_BT$. To account for the limited size of the in vitro data set, model output was degraded by randomly removing 1% of predicted nucleosome probabilities.

As the chemical potential is increased, nucleosomes undergo a transition in which their average number goes up in a step-like fashion [Fig. 1(e)] [17]. In contrast to the $\Phi = 0$ case, in which linkers are distributed exponentially, two-body interactions lead to pronounced discretization of linker lengths [Fig. 1(f)]. The first minimum of $\Phi$ becomes more dominant as the number of nucleosomes increases, leading to a well-positioned array with 4 bp-long linkers.
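A sketch of how $\langle N_{tot} \rangle$ can be tabulated against $\mu$ with the forward/backward sums behind Eqs. (1) and (2), with and without a two-body potential of the Fig. 1 form. The lattice here is a hypothetical toy (particle length 20 bp on 100 bp, flat $u$), so it only illustrates the monotone rise of $\langle N_{tot} \rangle$, which follows from $d\langle N \rangle/d\mu = \beta\,{\rm Var}(N) > 0$; it does not attempt to reproduce the step structure of Fig. 1(e), whose 2416 bp template and one-body wells are not modeled.

```python
import math


def mean_number(mu, u, a, phi=None, beta=1.0):
    """<N_tot> = sum_i n(i), using the forward/backward sums behind Eqs. (1)-(2).

    phi(linker) is an optional two-body potential in units of k_BT (None = hard cores only).
    """
    M = len(u)
    z = [math.exp(beta * (mu - u_k)) for u_k in u]

    def w(k, l):  # only called for l >= k + a, so the step function is already satisfied
        return math.exp(-beta * phi(l - k - a)) if phi else 1.0

    A, B = [0.0] * M, [0.0] * M
    for i in range(M):
        A[i] = 1.0 + sum(A[k] * z[k] * w(k, i) for k in range(i - a + 1))
    for i in reversed(range(M)):
        B[i] = 1.0 + sum(w(i, l) * z[l] * B[l] for l in range(i + a, M))
    Z = 1.0 + sum(A[i] * z[i] for i in range(M))
    return sum(A[i] * z[i] * B[i] for i in range(M)) / Z


# Hypothetical toy lattice: a = 20 bp, L = 100 bp (at most 5 particles), flat u.
a, L = 20, 100
u = [0.0] * (L - a + 1)


def fig1_like_phi(linker):
    # Same functional form as the Fig. 1 caption: A = 5 k_BT, b = 50 bp, 10 bp period.
    return 5.0 * math.cos(2 * math.pi * linker / 10) * math.exp(-linker / 50)


mus = [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
with_phi = [mean_number(mu, u, a, fig1_like_phi) for mu in mus]
without_phi = [mean_number(mu, u, a) for mu in mus]
for mu, n1, n0 in zip(mus, with_phi, without_phi):
    print(f"mu = {mu:+.0f}: <N> = {n1:.3f} (with Phi), {n0:.3f} (hard cores only)")
```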

We now use Eq. (9) to predict nearest-neighbor interactions from genome-wide nucleosome maps [Fig. 2(a)]. We find that despite significant experiment-to-experiment variations, all two-body potentials have minima within 1-2 bp of $5 + 10m$ bp, $m = 0, 1, \ldots$ [18]. Surprisingly, there are substantial differences between the two in vitro replicates of Kaplan et al. [5], with one replicate exhibiting higher values of $\Phi$ due to the pronounced depletion of nucleosomes separated by less than 10 bp. Apparently, chromatin structure can undergo subtle uncontrolled changes from experiment to experiment.
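Fig. 2(b) compares autocorrelations of nucleosome starting positions. The text does not specify the normalization used, so the sketch below assumes a Pearson correlation between the profile and its shifted copy at each lag; the function name and the purely periodic test profile are hypothetical. A 10 bp modulation in starting positions shows up as a correlation near +1 at a lag of 10 bp and near -1 at a lag of 5 bp.

```python
import math


def autocorrelation(profile, max_lag):
    """Pearson correlation between profile[i] and profile[i + lag] for lag = 1..max_lag."""
    result = []
    for lag in range(1, max_lag + 1):
        x, y = profile[:-lag], profile[lag:]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
        sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
        result.append(cov / (sx * sy))
    return result


# Hypothetical starting-position profile with a pure 10 bp modulation.
profile = [1.0 + math.cos(2 * math.pi * i / 10) for i in range(1000)]
ac = autocorrelation(profile, 20)
print(ac[4], ac[9])  # lags of 5 and 10 bp
```

Applied to predicted or measured $n(i)$ profiles, the amplitude of these oscillations is what distinguishes models with and without the two-body potential.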
FIG. 3: (Color online) A minimal model of nucleosome ordering in genic regions. (a) Dashed red and dotted orange lines: average nucleosome occupancy in vitro around the TSS and TTS [11]. Solid blue and dash-dot black lines: model predictions with and without $\Phi$ from Fig. 1. Both models have an average occupancy of 0.60 (less than the maximum possible occupancy of 0.82 because some histone octamers are not DNA-bound). Inset: one-body energy landscape with barrier heights, widths and shapes adjusted to reproduce the observed NDRs. (b) Same as (a), for in vivo nucleosomes (YPD medium) [19]. $\Phi$ is from Fig. 1 with $A = 7\,k_BT$. The log-intensities from the microarray were exponentiated and normalized separately for each gene, yielding an average occupancy of 0.70, which was also used in the models.



Two-body interactions are reflected in the autocorrelation of nucleosome starting positions [Fig. 2(b)]. The oscillations in the autocorrelation function are suppressed when nucleosome positions are predicted using a sequence-specific model which neglects two-body interactions [6]. This "spatially resolved" model assigns mono- and dinucleotide energies independently at each position within the nucleosomal site and is thus capable of capturing the 10-11 bp periodicity of one-body interactions. We find that the autocorrelation function is much closer to experiment if the two-body potential is included in the model [Fig. 2(b)].

Two-body interactions are also essential for reconstructing nucleosome occupancy profiles over transcribed regions [Fig. 3]. Sequence-specific energy barriers over NDRs must be low in vitro to account for the lack of occupancy oscillations induced by steric exclusion at a 1:1 DNA:histone mass ratio [11]. Even with the low barriers shown in Fig. 3(a), the interaction-free model yields an oscillatory profile which is not observed in the data. The oscillations are suppressed by the two-body potential, and the resulting profile increases towards the center of the gene, in contrast with the pure steric exclusion scenario in which nucleosomes adjacent to the barriers are always the most localized [13]. This behavior is also observed in vivo, where the +2 nucleosome has a higher occupancy peak than the +1 nucleosome [Fig. 3(b)]. The in vivo barriers are more pronounced to account for additional nucleosome depletion in the NDRs due to effects other than intrinsic histone-DNA interactions. Finally, in agreement with a previous hypothesis [11], a potential well is added to localize the +1 nucleosome in vivo. The well makes the TSS profile asymmetric with respect to the center of the NDR [compare to the more symmetric TTS profile in Fig. 3(b)].

In summary, our study is the first to show that short-range two-body interactions induced by chromatin fiber formation play a major role in genome-wide nucleosome ordering. We demonstrate that large-scale mononucleosome maps contain evidence of the two-body potential. This potential is more important than intrinsic histone-DNA interactions for predicting 10-11 bp periodicity in genome-wide nucleosome positions, and for understanding nucleosome occupancy in transcribed regions. Clearly, two-body interactions should be an integral part of genome-wide models of nucleosome occupancy. Our study also underscores the need for future experiments focused on multi-nucleosome distributions, which can be analyzed using our exact theory [Eqs. (5) and (6)].

This research was supported by National Institutes of Health (HG 004708) and by an Alfred P. Sloan Research Fellowship to AVM.